WikiM: Metapaths based Wikification of Scientific Abstracts
In order to disseminate the exponentially growing body of knowledge produced in
the form of scientific publications, it would be best to design mechanisms that
connect it with an already existing rich repository of concepts -- Wikipedia.
Not only does this make scientific reading simple and easy (by connecting the
concepts used in the scientific articles to their Wikipedia
explanations), but it also improves the overall quality of the article. In this
paper, we present a novel metapath based method, WikiM, to efficiently wikify
scientific abstracts -- a topic that has been rarely investigated in the
literature. One of the prime motivations for this work comes from the
observation that wikified abstracts of scientific documents help a reader
decide, better than plain abstracts do, whether (s)he would be
interested in reading the full article. We perform mention extraction mostly
through traditional tf-idf measures coupled with a set of smart filters. The
entity linking relies heavily on the rich citation and author publication
networks. Our observation is that various metapaths defined over these networks
can significantly enhance the overall performance of the system. For mention
extraction and entity linking, we outperform most of the competing
state-of-the-art techniques by a large margin, arriving at precision values of
72.42% and 73.8%, respectively, over a dataset from the ACL Anthology Network. In
order to establish the robustness of our scheme, we wikify three other datasets
and obtain precision values of 63.41%-94.03% and 67.67%-73.29%, respectively, for
the mention extraction and entity linking phases.
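The tf-idf step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the WikiM implementation: the function names, the background corpus, the stopword filter (standing in for the unspecified "smart filters"), and the single-token candidates are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_candidates(abstract, background, top_k=3, stopwords=frozenset()):
    """Rank tokens of `abstract` by tf-idf against a background corpus.

    `background` is a list of other abstracts (hypothetical input).
    `stopwords` is a simple stand-in filter, not the paper's filters.
    """
    docs = [set(tokenize(d)) for d in background]
    tokens = [t for t in tokenize(abstract) if t not in stopwords]
    tf = Counter(tokens)
    n_docs = len(docs) + 1  # background documents plus the abstract itself
    scores = {}
    for term, count in tf.items():
        # Document frequency: the abstract always contains the term.
        df = 1 + sum(term in d for d in docs)
        idf = math.log(n_docs / df)
        scores[term] = (count / len(tokens)) * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

Terms that occur in every background document receive an idf of zero and drop to the bottom of the ranking, which is the property that makes tf-idf a plausible first pass for picking mention candidates.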
Logarithmic or algebraic: roughening of a generalised Kardar-Parisi-Zhang equation
We show that a nearly phase-ordered two-dimensional (2D) active XY model on a
substrate, or a nearly flat active interface that follows a generalised
Kardar-Parisi-Zhang (KPZ) equation, can be stable in some parameter regimes. In
these regimes, the phase fluctuations of the XY model or the interface
conformation fluctuations can exhibit a sub-logarithmic or a super-logarithmic
roughness. Specifically, an interface of lateral size L, or 2D active XY model
on a substrate of linear size L, respectively, will undulate over a typical
size or display typical angular fluctuations of size , where
for sub-(super-)logarithmic roughness and a is a microscopic cutoff.
This generalises the well-known quasi-long-range order of the 2D equilibrium XY
model at low temperatures, implying surfaces less or more rough than 2D
Edwards-Wilkinson surfaces. In other parameter regimes, there is only
short-range phase order, or an algebraically rough interface.
Comment: Preliminary version, 5+4 pages
Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty
Entity Linking (EL) is the task of automatically identifying entity mentions
in a piece of text and resolving them to a corresponding entity in a reference
knowledge base like Wikipedia. There is a large number of EL tools available
for different types of documents and domains, yet EL remains a challenging task
where the lack of precision on particularly ambiguous mentions often spoils the
usefulness of automated disambiguation results in real applications. A priori
approximations of the difficulty of linking a particular entity mention can
facilitate flagging of critical cases as part of semi-automated EL systems,
while detecting latent factors that affect the EL performance, like
corpus-specific features, can provide insights on how to improve a system based
on the special characteristics of the underlying corpus. In this paper, we
first introduce a consensus-based method to generate difficulty labels for
entity mentions on arbitrary corpora. The difficulty labels are then exploited
as training data for a supervised classification task able to predict the EL
difficulty of entity mentions using a variety of features. Experiments over a
corpus of news articles show that EL difficulty can be estimated with high
accuracy, revealing also latent features that affect EL performance. Finally,
evaluation results demonstrate the effectiveness of the proposed method to
inform semi-automated EL pipelines.
Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP
Symposium On Applied Computing (SAC 2019)
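The consensus-based labelling idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it assumes that each mention has been linked by several EL tools, that agreement is measured as the share of tools voting for the most common entity, and that the thresholds and the label names "easy", "medium", and "hard" are hypothetical.

```python
from collections import Counter


def difficulty_label(predictions, easy_threshold=1.0, hard_threshold=0.5):
    """Derive a difficulty label for one mention from tool agreement.

    `predictions` holds the entity ID that each EL tool assigned to the
    mention. Thresholds are illustrative assumptions.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    _, top_votes = counts.most_common(1)[0]
    agreement = top_votes / len(predictions)  # share of the majority entity
    if agreement >= easy_threshold:
        return "easy"
    if agreement <= hard_threshold:
        return "hard"
    return "medium"
```

Labels produced this way need no manual annotation, which is what would allow them to serve as distantly supervised training data for a difficulty classifier.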